Add pure julia exp function #19831
Conversation
function exp{T<:Union{Float32,Float64}}(x::T)
xu = reinterpret(Unsigned, x)
xs = xu & ~sign_mask(T)
xsb = Int(xu >> Unsigned(sizeof(T)*8 - 1))
Probably just as efficient to use signbit
Maybe I should use `signbit` instead.
ya, would be clearer
using Base: sign_mask, exponent_mask, exponent_one, exponent_bias,
No need to import any of these since we don't extend them
i686 error is the usual failing error and is unrelated
@@ -70,11 +71,23 @@ end

# evaluate p[1] + x * (p[2] + x * (....)), i.e. a polynomial via Horner's rule
macro horner(x, p...)
@gensym val
Why is this necessary? Won't the macro hygiene automatically convert `t` into a unique name?
Yes, good point. I think this was an issue in an earlier version of Julia, but in 0.6 macro hygiene converts it into a unique name.
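(For readers following along, a simplified sketch of what a `@horner`-style macro does; the real Base macro differs in details, and this is exactly where hygiene helps — the `t` below is renamed to a unique symbol automatically, so no explicit `@gensym` is needed:)

```julia
# Simplified sketch of @horner: expands to nested muladd calls.
macro horner(x, p...)
    ex = esc(p[end])
    for i = length(p)-1:-1:1
        # hygiene renames `t` to a unique symbol consistently
        ex = :(muladd(t, $ex, $(esc(p[i]))))
    end
    Expr(:block, :(t = $(esc(x))), ex)
end

@horner(0.5, 1.0, 2.0, 3.0)   # 1.0 + 0.5*(2.0 + 0.5*3.0) == 2.75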
MINEXP(::Type{Float32}) = -103.97207708f0 # log 2^-150

# coefficients from: lib/msun/src/e_exp.c
@inline exp_kernel{T<:Float64}(x::T) = @horner_oftype(x, 1.66666666666666019037e-1,
`x::Float64`?
# evaluate p[1] + x * (p[2] + x * (....)), i.e. a polynomial via Horner's rule
# and convert coefficients to same type as x
macro horner_oftype(x, p...)
Why is this macro needed? If you change the floating-point type, you generally will change the degree of the polynomial as well, and you can just use literals of the correct type, no? e.g. it looks like you only use it for `Float64` and `Float32` below, and Julia has a compact syntax for `Float32` literals (append `f0`).
I agree that it will generally change the degree of the polynomial. That said, one case where it doesn't really change is if you also want native Float16 support (although on CPUs this is useless): dispatching to the same polynomial (the one for `::Float32`) works well in practice (from many cases on different functions, I found it's really hard to find good coefficients by reducing the degree from the Float32 version for Float16). For example, this function will work just as well within accuracy for `Float16`. Thus it's helpful to define this macro to take care of such dispatch cases.
I would rather not add the extra macro for an odd case (it is unusual that you wouldn't be able to reduce the degree for Float16) that we don't really implement (probably on CPUs we will just compute `exp(x::Float16)` by converting to/from Float32, no?).
I would rather not add the extra macro for an odd case (it is unusual that you wouldn't be able to reduce the degree for Float16) that we don't really implement
Not really unusual for Float16 (as opposed to going from Float128 -> Float64). You have a lot less precision, so it's very hard to find an even smaller degree polynomial that retains the same accuracy constraint (< 1 ulp). This is the hardest/most time-consuming part: testing accuracy and finding the polynomial coefficients.
(probably on CPUs we will just compute exp(x::Float16) by converting to/from Float32, no?)
yes
I will remove it anyways
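For reference, the idea behind the removed macro can be sketched as follows (a hypothetical simplified version, not the PR's exact code): the same Horner expansion, but with each coefficient converted to the argument's type, so one coefficient list can serve several float types via dispatch.

```julia
# Simplified sketch of @horner_oftype: Horner's rule with each
# coefficient converted to the type of x.
macro horner_oftype(x, p...)
    ex = :(oftype(t, $(esc(p[end]))))
    for i = length(p)-1:-1:1
        ex = :(muladd(t, $ex, oftype(t, $(esc(p[i])))))
    end
    Expr(:block, :(t = $(esc(x))), ex)
end

# A Float16 argument reuses the Float64 literals, converted on the fly:
# @horner_oftype(Float16(0.5), 1.0, 2.0, 3.0) isa Float16
```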
MINEXP(::Type{Float32}) = -103.97207708f0 # log 2^-150

# coefficients from: lib/msun/src/e_exp.c
@inline exp_kernel{T<:Float64}(x::T) = @horner_oftype(x, 1.66666666666666019037e-1,
Why not just `exp_kernel(x::Float64)`?
Originally my motivation was to have two or more polynomials depending on the float type and use dispatch as follows:
- A low degree polynomial for Float16 and Float32
- A higher degree polynomial for Float64 and other floats like Float128
So in: https://github.com/JuliaMath/Amal.jl/blob/master/src/exp.jl
I have the two polynomial kernels dispatch on the following types:
SmallFloat = Union{Float16,Float32}
LargeFloat = Float64 # perhaps Union{Float64,Float128} in the future
That grand goal might be a pipe dream. So if you all prefer, I can restrict the types instead of using this and the `@horner_oftype` machinery (although it does have some elegance to have the current machinery with `horner_oftype` and the polynomial dispatch on the type).
I would prefer eliminating the extra macro for now.
(Wouldn't the `Float16` computations be faster in `Float32` arithmetic anyway? And wouldn't we need a higher degree for `Float128` than for `Float64`?)
Yep, it would be more efficient on the CPU. The motivation was for GPU code, where you do have native Float16 hardware and there is a real benefit (there is some very nice work coming along in having better Float16 support). For `Float128` I have not tested yet, but I suspect so.
I guess I can revisit when the time comes and restrict the polynomial dispatch and the additional macro.
# Float64, which is well within the allowable error. however,
# 2.718281828459045 is closer to the true value so we prefer that answer,
# given that 1.0 is such an important argument value.
if x == T(1.0) && T == Float64
Maybe `x == T(1.0) && return T(2.718281828459045235360)` so that we get the exactly rounded result for `Float32` too?
`Float32` is exactly rounded without this. So it's better to do this to remove the extra check for the Float32 version.
Okay.
What's the difference between `T == Float64` and `T === Float64`? I'm leaning towards using the latter, but for no particular reason.
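(Editor's note, not from the thread: for type objects the two are equivalent, because `==` has no special method for types and falls back to `===`, i.e. object identity. A quick check:)

```julia
# == on DataType arguments falls back to ===, so both are identity checks.
Float64 == Float64    # true
Float32 == Float64    # false
Float64 === Float64   # true: types are singleton objects
```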
end
else
n = round(T(LOG2E)*x)
k = unsafe_trunc(Int,n)
It seems bad that this will behave differently on 32-bit and 64-bit machines?
No, it doesn't matter in this case. `k` is the integer representing the exponent of the float, so there is no way it will cause issues, since the float will overflow way past the point where the `Int32` would. We could all the same just use `Int32`, but I think it might be nicer to just use the machine-size `Int`?
Better to use `Int32`, I think. (We could even use `Int16`, it sounds like, but there's probably no advantage to that.)
I made the change, but I don't fully understand why we should prefer `Int32` over `Int`. I tried both and it's not like there is any critical difference in the resulting code.
Would be good to have test cases that exercise each of the branches, including test cases just above and just below the various thresholds like
Out of curiosity, why is it so much faster than the openlibm version? More inlining?
I have carefully tested this function here: https://github.com/JuliaMath/Amal.jl/tree/master/test
Hard for me to tell without seeing something like
@musm, I don't doubt that you have tested your code, but we should have the key tests in our test suite, at the very least to make sure that we don't get regressions in the future.
Addressed some of the review comments. I added a function:
fpintsize(::Type{Float64}) = UInt64
fpintsize(::Type{Float32}) = UInt32
This would make this and one of the other float functions cleaner:
reinterpret(T, rem(Int32(32), fpintsize(T)))
# vs. what we use now
xu = reinterpret(Unsigned, x)
reinterpret(T, rem(Int32(32), typeof(xu)))
Need to still add tests and benchmarks
# integer size of float
fpintsize(::Type{Float64}) = UInt64
fpintsize(::Type{Float32}) = UInt32
fpintsize(::Type{Float16}) = UInt16
`fpintsize` sounds like it should return a number (`64` or `8`)... maybe `fpint` or `fpinttype`?
`fpinttype` sounds good to me
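(A sketch of how the renamed helper might be used generically; the `signbit_raw` example below is hypothetical, not from the PR, using the 0.6-era parametric syntax the PR itself uses:)

```julia
# The unsigned integer type with the same bit width as each IEEE float
# type, enabling generic bit-twiddling code.
fpinttype(::Type{Float64}) = UInt64
fpinttype(::Type{Float32}) = UInt32
fpinttype(::Type{Float16}) = UInt16

# e.g. extract the raw sign bit generically:
signbit_raw{T<:AbstractFloat}(x::T) =
    reinterpret(fpinttype(T), x) >> (8*sizeof(T) - 1)
```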
@pure exponent_raw_max{T<:AbstractFloat}(::Type{T}) = Int(exponent_mask(T) >> significand_bits(T))



need to get rid of all these additional white spaces :)
-1.65339022054652515390e-6, 4.13813679705723846039e-8)

# coefficients from: lib/msun/src/e_expf.c
@inline exp_kernel(x::Float32) = @horner(x, 1.6666625440f-1, -2.7667332906f-3)
(This is such a low degree that I wonder if we'd be better off using a minimax polynomial rather than a minimax rational function, at least for the single-precision case — the optimal polynomial would require a higher degree, but it would avoid the division. I don't think we need to address that possibility in this PR, however, since this is no worse than what we are doing now.)
This is a good question. I have already played with many variations, and my conclusion is that the msun version is the best of all the things I have tested, considering both accuracy (critical goal of being < 1 ulp) and performance.
Here's the problem with the minimax polynomial approach for the exp function, on my test suite for Float32:
FMA system, rational:        max 0.86076152 at x = -5.85780, mean 0.03712445
non-FMA system, rational:    max 0.86076152 at x = -5.85780, mean 0.03712434
FMA system, polynomial:      max 0.84459400 at x = -48.8587, mean 0.03627487
non-FMA system, polynomial:  max 1.09986448 at x = 5.23060,  mean 0.03798720
over the range: vcat(-10:0.0002:10, -1000:0.001:1000, -120:0.0023:1000, -1000:0.02:2000)
(errors may be larger, since we haven't tested all floats)
As you can see, it's not possible to be under 1 ulp on non-FMA systems using this polynomial (below).
What about speed? Well, I tested 3 hot branches, and the polynomial version on an FMA system is ~0.5-0.8 ns slower.
For reference, this is the best minimax polynomial I can find for Float32:
@inline exp_kernel{T<:SmallFloat}(x::T) = @horner_oftype(x, 1.0, 1.0, 0.5,
    0.1666666567325592041015625,
    4.1666455566883087158203125e-2,
    8.333526551723480224609375e-3,
    1.39357591979205608367919921875e-3,
    1.97799992747604846954345703125e-4)
Note it's not as simple as adding more terms to this polynomial to improve the accuracy; this is the minimal minimax polynomial I was able to find that meets the < 1 ulp constraint.
For Float64:
@inline exp_kernel{T<:LargeFloat}(x::T) = @horner_oftype(x, 1.0, 1.0, 0.5,
    0.16666666666666685170383743752609007060527801513672,
    4.1666666666666692109277647659837384708225727081299e-2,
    8.3333333333159547579027659480743750464171171188354e-3,
    1.38888888888693412537733706813014578074216842651367e-3,
    1.9841269898657093212653024227876130680670030415058e-4,
    2.4801587357008890921336585755341275216778740286827e-5,
    2.7557232875898009206386968239499424271343741565943e-6,
    2.7557245320026768203034231441428403286408865824342e-7,
    2.51126540120060271373185023340013355408473216812126e-8,
    2.0923712382298872819985862227861600493028504388349e-9)
I haven't tested this in a while, but the same conclusion holds.
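(A minimal sketch of the kind of ulp-error measurement these numbers come from — my reconstruction, not the actual Amal.jl test code: compare the Float32 result against a high-precision BigFloat reference and express the error in units in the last place.)

```julia
# Error of exp(x::Float32) in ulps, measured against a BigFloat reference.
function exp_ulp_error(x::Float32)
    y = exp(x)                                 # value under test
    yref = exp(big(x))                         # high-precision reference
    Float64(abs(big(y) - yref) / big(eps(y)))  # distance in units of eps at y
end

# e.g. worst error over a coarse grid:
maximum(exp_ulp_error, -10f0:0.25f0:10f0)
```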
@inline exp_kernel(x::Float32) = @horner(x, 1.6666625440f-1, -2.7667332906f-3)

# custom coefficients (slightly better mean accuracy, but also a bit slower)
# @inline exp_kernel(x::Float32) = @horner_oftype(x, 0.1666666567325592041015625f0,
I need to edit this to `@horner` or delete the comment entirely.
@@ -0,0 +1,137 @@
# Based on FreeBSD lib/msun/src/e_exp.c
# ====================================================
# Copyright (C) 2004 by Sun Microsystems, Inc. All rights reserved. Permission
need to add exceptions to the license file and contrib/add_license_to_files.jl
I don't really know what you mean. What do I need to do?
I looked here, for example:
https://github.com/JuliaLang/openlibm/blob/master/LICENSE.md
This part is kinda relevant
OpenLibm
OpenLibm contains code that is covered by various licenses.
The OpenLibm code derives from the FreeBSD msun and OpenBSD libm implementations, which in turn derives from FDLIBM 5.3. As a result, it has a number of fixes and updates that have accumulated over the years in msun, and also optimized assembly versions of many functions. These improvements are provided under the BSD and ISC licenses. The msun library also includes work placed under the public domain, which is noted in the individual files. Further work on making a standalone OpenLibm library from msun, as part of the Julia project is covered under the MIT license. The test files, test-double.c and test-float.c are under the LGPL.
Do I just need to add a link to LICENSE.md to openlibm (it's already there) and say that was used as a reference or something?
The Julia license file needs to have an exception here, and the script that adds license headers needs to add an exception for this file. It's derived code so should usually retain its original licensing
Is there an example I can copy to do this?
So I should add this line
- base/special/exp.jl (see [FREEBSD MSUN](https://github.com/freebsd/freebsd) [FreeBSD/2-clause BSD/Simplified BSD License])
below
The following components of Julia's standard library have separate licenses:
even if it's derived code?
# ====================================================
# Copyright (C) 2004 by Sun Microsystems, Inc. All rights reserved. Permission
# to use, copy, modify, and distribute this software is freely granted,
# provided that this notice is preserved.
# ====================================================
do I even need to include this? We are distributing the original file.
Is it necessary to add a line here https://github.com/JuliaLang/julia/blob/master/contrib/add_license_to_files.jl#L39 to skip this?
I'm not sure. Parts of this include ported and derived code?
Copyright Sun Microsystems, permission to modify and distribute granted provided this notice is preserved. So it's not MIT-licensed, copyright JuliaLang contributors, like the majority of the rest of Base. We're not distributing the original file in this repo under our MIT license.
Ok. I added the commit to address this : 46058d3
You only did half of it.
Is it necessary to add a line here https://github.com/JuliaLang/julia/blob/master/contrib/add_license_to_files.jl#L39 to skip this?
Yes. "exceptions to the license file and contrib/add_license_to_files.jl"
It is very cool that it is possible to do this in julia.
Running nanosoldier after JuliaCI/BaseBenchmarks.jl#58 should be an interesting test, since there are quite a few other files in BaseBenchmarks that use the
Ok so the benchmarks have been merged.
Should we trigger nanosoldier now?
@nanosoldier Also this should be squashed before committing it to master.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
Performance looks good; the regressions are probably noise. But I don't think nanosoldier was running the new exp benchmarks?
Can we retrigger the benchmarks? They were just uploaded to nanosoldier. Thanks
@nanosoldier
# compute approximation
z = r*r
p = r - z*exp_kernel(z)
y = T(1.0) - ((lo - (r*p)/(T(2.0) - p)) - hi) # compute polynomial on reduced arugment
argument
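(For context, an editor's sketch of the overall msun-style scheme the lines above fit into — simplified, without the hi/lo split of log(2) or the rational kernel, and with a low-order Taylor series standing in for `exp_kernel`:)

```julia
const LOG2E = 1.4426950408889634   # 1/log(2)

# Sketch: exp(x) = 2^k * exp(r), where r = x - k*log(2), |r| <= log(2)/2
function exp_sketch(x::Float64)
    n = round(LOG2E * x)            # nearest integer multiple of log(2)
    k = unsafe_trunc(Int32, n)
    r = muladd(n, -log(2.0), x)     # single-step reduction (no hi/lo split)
    small = 1 + r*(1 + r*(1/2 + r*(1/6 + r/24)))  # crude kernel stand-in
    return ldexp(small, Int(k))     # scale by 2^k
end
```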
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels
Force-pushed from bd4945c to 7323b8f
Squashed and g2g on my end. The regressions don't make sense to me, since those files don't even use 'exp'. Not sure why we also don't see more improvements, as I have locally benchmarked other branches in exp, but I think this is just some resolution/noise issue on nanosoldier.
bump
* Add pure julia exp function
* Add a few more ldexp tests
Need to add benchmarks, but it is faster than or the same speed as the openlibm version: between 1.0x (same speed) and 1.15x to 1.3x to 1.5x faster, depending on which branch you test (testing using BenchmarkTools). On most branches it's faster; I think there is only one branch where it's basically the same speed. Accuracy is basically the same (both are based on the msun version).